#### Chips Alliance December 2022 Tech Update

# Dataset for ML-Guided Chip Design

Aman Arora
Ph.D. Candidate and Graduate Fellow
Department of Electrical and Computer Engineering
The University of Texas at Austin







Thanks to Zhigang Wei, who is doing the hands-on work for this project.

Thanks to our advisor Prof. Lizy John.



# Using ML for Chip Design



PowerGear: Early-Stage Power Estimation in FPGA HLS via Heterogeneous Edge-Centric GNNs

Zhe Lin, Zike Yuan, Jieru Zhao, Wei Zhang, Hui Wang, and Yonghong Tian (Peng Cheng Laboratory; The University of Auckland; Shanghai Jiao Tong University; The Hong Kong University of Science and Technology; Peking University)




## Need for Open-Source Datasets

Each project requires datasets to train NN models

Creating datasets is time-consuming and expensive

- Tools, Licenses
- Scripts to run and parse
- Machines
- Curation

#### Datasets used by existing projects

- Proprietary and not openly available
- Created on an ad hoc basis for a specific problem, leading to many custom datasets



## Need for Open-Source Datasets

#### **Dataset contents**

- Graphs of netlists of HDL designs
- Performance counters of C applications running on hardware
- Signal activity with a netlist graph
- Power consumption (estimated from a tool and measured on a board)
- FPGA resource usage and timing
- 2D images of floor-planned, placed, routed circuits

Several projects can share/reuse data

Open-source datasets for chip design would be very useful to the research community

- One dataset was released as open source very recently (Oct 2022)
  - CircuitNet (from Peking University)



# Chip Design Data Set (CD<sup>2</sup>S)

#### HDL designs

- OpenCores
- VTR (& Koios)
- NVDLA
- Etc.

#### C applications

- PolyBench
- CHStone
- MachSuite
- Etc.

#### Features:

- Number/size of primary inputs and outputs
- Number of arithmetic (multiply, add, etc.) and logical operators (and, xor, etc.)
- Number of memory bits
- Size of the design (netlist primitives in a non-tech mapped netlist)
- Application domain (signal processing, machine learning, general purpose processor, networking, etc.)
- Number of registers, signals, muxes, FSMs (for HDL designs)
- Number of basic blocks, conditionals, loops (for C applications)
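As a rough illustration, the feature list above could map onto a per-design record like the one below. This is only a sketch: the class name `DesignFeatures`, every field name, and the example values are assumptions made for illustration and do not describe the dataset's actual schema or any real design.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class DesignFeatures:
    """One row of per-design features (hypothetical schema, not the dataset's real format)."""
    name: str
    source: str                      # "hdl" or "c"
    domain: str                      # e.g. "signal processing", "machine learning"
    primary_inputs: int
    primary_outputs: int
    arithmetic_ops: int              # multiply, add, etc.
    logical_ops: int                 # and, xor, etc.
    memory_bits: int
    netlist_primitives: int          # size of the non-tech-mapped netlist
    # Features that only apply to HDL designs
    registers: Optional[int] = None
    signals: Optional[int] = None
    muxes: Optional[int] = None
    fsms: Optional[int] = None
    # Features that only apply to C applications
    basic_blocks: Optional[int] = None
    conditionals: Optional[int] = None
    loops: Optional[int] = None

    def __post_init__(self):
        # Reject records whose source kind is neither HDL nor C.
        if self.source not in ("hdl", "c"):
            raise ValueError(f"unknown source kind: {self.source!r}")


# Placeholder values for illustration only; they do not describe a real design.
example = DesignFeatures(
    name="example_fir_filter", source="hdl", domain="signal processing",
    primary_inputs=3, primary_outputs=1, arithmetic_ops=16, logical_ops=4,
    memory_bits=0, netlist_primitives=120,
    registers=32, signals=40, muxes=2, fsms=0,
)
row = asdict(example)  # flat dict, ready to be written as one CSV row
```

Keeping the HDL-only and C-only fields optional lets both kinds of design share one table, at the cost of empty cells for the fields that do not apply.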



# Chip Design Data Set (CD<sup>2</sup>S)

#### Metrics:

- Area (resource usage)
- Power
- Wire length
- Operating frequency

#### For:

- Multiple FPGA devices from multiple FPGA vendors
- Multiple ASIC libraries/PDKs

#### For:

- Multiple implementation settings (HLS pragmas, gate-level synthesis options)
- Multiple process corners
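One way to picture the resulting data volume: each design is run once per combination of target, implementation setting, and process corner, and every run yields the four metrics listed above. The sketch below enumerates that grid. The function name `enumerate_runs` and all design, target, and setting names are hypothetical placeholders, not entries from the actual dataset.

```python
from itertools import product

# Metric columns taken from the slide; values are filled in after each tool run.
METRICS = ("area", "power", "wire_length", "operating_frequency")


def enumerate_runs(designs, targets, settings, corners):
    """Return one empty result record per (design, target, setting, corner) combination."""
    runs = []
    for design, target, setting, corner in product(designs, targets, settings, corners):
        run = {"design": design, "target": target, "setting": setting, "corner": corner}
        run.update({metric: None for metric in METRICS})  # not measured yet
        runs.append(run)
    return runs


# Hypothetical placeholder names, for illustration only.
runs = enumerate_runs(
    designs=["design_a", "design_b"],
    targets=["fpga_device_x", "asic_pdk_y"],
    settings=["default", "area_optimized"],
    corners=["typical"],
)
print(len(runs))  # 2 designs x 2 targets x 2 settings x 1 corner
```

The record count grows multiplicatively with each axis, which is part of why building such datasets by hand is expensive.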



# Case Study










### Status

#### Link

https://tiny.one/gocd2s

#### Current focus

FPGA

#### Numbers

- HDL designs: 2348
- Generated designs from C applications: 4580

#### Funding

- Meta/Facebook
- 1 student



## Next Steps

#### More data

- FPGA flow
  - C dataset
    - Collecting data for MachSuite and CHStone benchmarks
  - Verilog dataset
    - Parsing contents from Yosys reports into CSV
    - Running with VTR and Vivado and parsing reports
- ASIC flow
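The "Yosys reports into CSV" step might look roughly like the sketch below. It assumes the report text follows the plain-text layout that the Yosys `stat` command has traditionally printed (a `=== module ===` header, `Number of ...:` lines, then indented per-cell-type counts); other Yosys versions may format this differently. The sample report is hand-written for illustration and is not real project data, and the function names are hypothetical, not the project's actual scripts.

```python
import csv
import io
import re

# Hand-written sample in the style of a Yosys `stat` report (illustrative only).
SAMPLE_REPORT = """\
=== counter ===

   Number of wires:                 10
   Number of wire bits:             40
   Number of memories:               0
   Number of memory bits:            0
   Number of cells:                  8
     $_AND_                          3
     $_DFF_P_                        4
     $_XOR_                          1
"""

MODULE_RE = re.compile(r"^=== (\S+) ===$")
COUNT_RE = re.compile(r"^\s*Number of ([\w ]+):\s+(\d+)$")
CELL_RE = re.compile(r"^\s+(\S+)\s+(\d+)$")


def parse_stat_report(text):
    """Return one dict per module with summary counts and per-cell-type counts."""
    rows = []
    current = None
    for line in text.splitlines():
        m = MODULE_RE.match(line.strip())
        if m:
            current = {"module": m.group(1)}
            rows.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first module header
        m = COUNT_RE.match(line)
        if m:
            current[m.group(1).strip().replace(" ", "_")] = int(m.group(2))
            continue
        m = CELL_RE.match(line)
        if m:
            current["cell:" + m.group(1)] = int(m.group(2))
    return rows


def rows_to_csv(rows):
    """Serialize parsed rows to CSV text; counts missing for a module default to 0."""
    fields = ["module"] + sorted({k for r in rows for k in r if k != "module"})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, restval=0)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = parse_stat_report(SAMPLE_REPORT)
csv_text = rows_to_csv(rows)
```

Taking the union of all keys as the CSV header keeps one table usable across designs that instantiate different cell types.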

#### Bringing to Chips Alliance

- Submitting as a sandbox project soon
- Call to contribute
  - Writing scripts, running tools, and parsing data



## Summary



## Thanks!

Zhigang Wei (zw5259@utexas.edu)

Aman Arora (aman.kbm@utexas.edu)